Differentially Private Feature Selection via Stability Arguments, and the Robustness of the Lasso
Abstract
We design differentially private algorithms for statistical model selection. Given a data set and a large, discrete collection of “models”, each of which is a family of probability distributions, the goal is to determine the model that best “fits” the data. This is a basic problem in many areas of statistics and machine learning.

We consider settings in which there is a well-defined answer, in the following sense: suppose that there is a nonprivate model selection procedure f, which is the reference to which we compare our performance. Our differentially private algorithms output the correct value f(D) whenever f is stable on the input data set D. We work with two notions, perturbation stability and subsampling stability. We give two classes of results: generic ones that apply to any function with a discrete output set, and specific algorithms for the problem of sparse linear regression. The algorithms we describe are efficient and in some cases match the optimal nonprivate asymptotic sample complexity.

Our algorithms for sparse linear regression require analyzing the stability properties of the popular LASSO estimator. We give sufficient conditions for the LASSO estimator to be robust to small changes in the data set, and show that these conditions hold with high probability under essentially the same stochastic assumptions that are used in the literature to analyze convergence of the LASSO.
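The generic results pair a nonprivate selector f with a test of whether f is stable on D, releasing f(D) only when the test passes. The Python below is a minimal sketch of that idea under stated assumptions, not the paper's algorithms or constants. It uses a propose-test-release style check: a sensitivity-1 distance to instability plus Laplace(1/ε) noise must exceed log(1/δ)/ε. It assumes replace-one neighbouring data sets. For the subsampling notion it swaps in disjoint blocks rather than random subsamples. All function names (`laplace`, `passes_noisy_test`, `release_if_stable`, `subsample_and_aggregate`) are hypothetical, and `dist_to_instability` is a caller-supplied lower bound that the sketch does not compute.

```python
import math
import random
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def laplace(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def passes_noisy_test(distance: float, epsilon: float, delta: float,
                      rng: random.Random) -> bool:
    # `distance` must change by at most 1 between neighbouring data sets.
    # If the true distance is <= 0, this returns True with probability <= delta / 2.
    return distance + laplace(1.0 / epsilon, rng) > math.log(1.0 / delta) / epsilon


def release_if_stable(data: Sequence, f: Callable[[Sequence], Hashable],
                      dist_to_instability: Callable[[Sequence], int],
                      epsilon: float, delta: float,
                      rng: random.Random) -> Optional[Hashable]:
    """Perturbation-stability sketch: return f(data), or None to mean 'bottom'.

    dist_to_instability (hypothetical, supplied by the caller) must lower-bound
    how many records have to change before f's output changes.
    """
    if passes_noisy_test(dist_to_instability(data), epsilon, delta, rng):
        return f(data)
    return None


def subsample_and_aggregate(data: Sequence, f: Callable[[Sequence], Hashable],
                            num_blocks: int, epsilon: float, delta: float,
                            rng: random.Random) -> Optional[Hashable]:
    """Subsampling-stability sketch using disjoint blocks (an assumption).

    Replacing one record alters a single block, so at most one vote moves.
    """
    blocks = [data[i::num_blocks] for i in range(num_blocks)]
    ranked = Counter(f(block) for block in blocks).most_common()
    winner, top = ranked[0]
    second = ranked[1][1] if len(ranked) > 1 else 0
    # Moving one vote shrinks the gap by at most 2, so the winner survives any
    # (gap - 1) // 2 vote changes; this bound changes by at most 1 per record.
    margin = (top - second - 1) // 2
    if passes_noisy_test(margin, epsilon, delta, rng):
        return winner
    return None


if __name__ == "__main__":
    rng = random.Random(0)
    data = list(range(200))
    # Every block selects the same "model", so it is released (almost surely).
    print(subsample_and_aggregate(data, lambda b: "model A", 100, 1.0, 1e-6, rng))
    # Blocks split evenly between two answers, so the result is None (almost surely).
    print(subsample_and_aggregate(data, lambda b: b[0] % 2, 100, 1.0, 1e-6, rng))
```

The design choice worth noting is that privacy comes entirely from the noisy threshold test. When f is very stable, the exact nonprivate answer f(D) is returned unchanged. When it is not, the mechanism declines to answer rather than returning a noisy model.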
Similar resources
Differentially Private Model Selection via Stability Arguments and the Robustness of the Lasso
We design differentially private algorithms for statistical model selection. Given a data set and a large, discrete collection of “models”, each of which is a family of probability distributions, the goal is to determine the model that best “fits” the data. This is a basic problem in many areas of statistics and machine learning. We consider settings in which there is a well-defined answer, in ...
Increasing the Capacity and PSNR in Blind Watermarking Resist Against Cropping Attacks
Watermarking has increased dramatically in recent years in the Internet and digital media. Watermarking is one of the powerful tools to protect copyright. Local image features have been widely used in watermarking techniques based on feature points. In various papers, the invariance feature has been used to obtain the robustness against attacks. The purpose of this research was based on local f...
Feature Selection in Big Data by Using the enhancement of Mahalanobis–Taguchi System; Case Study, Identifiying Bad Credit clients of a Private Bank of Islamic Republic of Iran
The Mahalanobis-Taguchi System (MTS) is a relatively new collection of methods proposed for diagnosis and forecasting using multivariate data. It consists of two main parts: Part 1, the selection of useful variables in order to reduce the complexity of multi-dimensional systems and part 2, diagnosis and prediction, which are used to predict the abnormal group according to the remaining us...
A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts
High dimensional microarray datasets are difficult to classify since they have many features with small number of instances and imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...
Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets
With the advancement of metagenome data mining science has become focused on microarrays. Microarrays are datasets with a large number of genes that are usually irrelevant to the output class; hence, the process of gene selection or feature selection is essential. So, it follows that you can remove redundant genes and increase the speed and accuracy of classification. After applying the gene se...
Publication date: 2013